The Hybrid Approach for Handling and Detecting Outliers from Dynamic Data Stream
Abstract
Outlier detection is currently an active area of research in the data mining community. In this article we propose a hybrid approach to capture outliers in a dynamic data stream. We first apply the k-means algorithm, which partitions the data set into a number of chunks, or clusters, each containing a set of data points. Once the clusters are formed, the centroid of each cluster is calculated. Points lying near the centroid of their cluster are unlikely to be outliers, so they can be pruned from each cluster. Next, a distance-based technique computes the distance from the centroid to each remaining candidate outlier and compares it against a preset threshold value. If the distance is greater than the threshold, the point is declared an outlier; otherwise it is treated as a real object. The proposed approach thus combines two techniques to find outliers in the data set efficiently. The hybrid approach has a lower computational cost: the algorithm efficiently prunes the safe cells and saves a large number of extra calculations.
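The steps in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names, the `prune_radius` and `threshold` parameters, the Euclidean distance, and the explicit initial centroids are all assumptions made for the example. It processes a single chunk of the stream in memory. In this simplified form the pruning step and the threshold test use the same centroid distance, so the pruning is illustrative only.

```python
import math


def assign(points, centroids):
    """Group points by the index of their nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        clusters[idx].append(p)
    return clusters


def kmeans(points, init_centroids, max_iters=50):
    """Plain k-means; init_centroids is supplied by the caller (assumption)."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iters):
        clusters = assign(points, centroids)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                # Mean of each coordinate over the cluster members.
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assign(points, centroids)


def detect_outliers(points, init_centroids, prune_radius, threshold):
    """Cluster, prune points near their centroid, then threshold the rest."""
    centroids, clusters = kmeans(points, init_centroids)
    outliers = []
    for centroid, members in zip(centroids, clusters):
        # Pruning step: points close to the centroid are not candidates.
        candidates = [p for p in members if math.dist(p, centroid) > prune_radius]
        # Distance-based step: a candidate beyond the threshold is an outlier.
        outliers.extend(p for p in candidates if math.dist(p, centroid) > threshold)
    return outliers


# Hypothetical data: two tight groups plus one stray point.
chunk = [(0, 0), (0, 1), (1, 0), (1, 1),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (0, 6)]
found = detect_outliers(chunk, init_centroids=[(0, 0), (10, 10)],
                        prune_radius=1.0, threshold=3.0)
print(found)
```

For a stream, the same function could be called on each incoming chunk in turn. How the threshold and pruning radius should be chosen is not specified in the abstract and is left open here.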